AITopics | super learner

Collaborating Authors

super learner

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Automated machine learning: AI-driven decision making in business analytics

Schmitt, Marc

arXiv.org Artificial IntelligenceJun-3-2025

The realization that AI-driven decision-making is indispensable in today's fast-paced and ultra-competitive marketplace has raised interest in industrial machine learning (ML) applications significantly. The current demand for analytics experts vastly exceeds the supply. One solution to this problem is to increase the user-friendliness of ML frameworks to make them more accessible for the non-expert. Automated machine learning (AutoML) is an attempt to solve the problem of expertise by providing fully automated off-the-shelf solutions for model choice and hyperparameter tuning. This paper analyzed the potential of AutoML for applications within business analytics, which could help to increase the adoption rate of ML across all industries. The H2O AutoML framework was benchmarked against a manually tuned stacked ML model on three real-world datasets. The manually tuned ML model could reach a performance advantage in all three case studies used in the experiment. Nevertheless, the H2O AutoML package proved to be quite potent. It is fast, easy to use, and delivers reliable results, which come close to a professionally tuned ML model. The H2O AutoML framework in its current capacity is a valuable tool to support fast prototyping with the potential to shorten development and deployment cycles. It can also bridge the existing gap between supply and demand for ML experts and is a big step towards automated decisions in business analytics. Finally, AutoML has the potential to foster human empowerment in a world that is rapidly becoming more automated and digital.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.iswa.2023.200188

2205.10538

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.47)

Industry:

Banking & Finance > Credit (0.48)
Education > Curriculum > Subject-Specific Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

A Density Ratio Super Learner

Wu, Wencheng, Benkeser, David

arXiv.org Machine LearningAug-8-2024

The estimation of the ratio of two density probability functions is of great interest in many statistics fields, including causal inference. In this study, we develop an ensemble estimator of density ratios with a novel loss function based on super learning. We show that this novel loss function is qualified for building super learners. Two simulations corresponding to mediation analysis and longitudinal modified treatment policy in causal inference, where density ratios are nuisance parameters, are conducted to show our density ratio super learner's performance empirically.

density ratio, learner, super learner, (16 more...)

arXiv.org Machine Learning

2408.04796

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Pseudo-Observations and Super Learner for the Estimation of the Restricted Mean Survival Time

Cwiling, Ariane, Perduca, Vittorio, Bouaziz, Olivier

arXiv.org Machine LearningApr-26-2024

In the context of right-censored data, we study the problem of predicting the restricted time to event based on a set of covariates. Under a quadratic loss, this problem is equivalent to estimating the conditional Restricted Mean Survival Time (RMST). To that aim, we propose a flexible and easy-to-use ensemble algorithm that combines pseudo-observations and super learner. The classical theoretical results of the super learner are extended to right-censored data, using a new definition of pseudo-observations, the so-called split pseudo-observations. Simulation studies indicate that the split pseudo-observations and the standard pseudo-observations are similar even for small sample sizes. The method is applied to maintenance and colon cancer datasets, showing the interest of the method in practice, as compared to other prediction methods. We complement the predictions obtained from our method with our RMST-adapted risk measure, prediction intervals and variable importance measures developed in a previous work.

algorithm, learner, super learner, (16 more...)

arXiv.org Machine Learning

2404.17211

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Modeling & Simulation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Misconduct in Post-Selections and Deep Learning

Weng, Juyang

arXiv.org Artificial IntelligenceFeb-13-2024

This is a theoretical paper on "Deep Learning" misconduct in particular and Post-Selection in general. As far as the author knows, the first peer-reviewed papers on Deep Learning misconduct are [32], [37], [36]. Regardless of learning modes, e.g., supervised, reinforcement, adversarial, and evolutional, almost all machine learning methods (except for a few methods that train a sole system) are rooted in the same misconduct -- cheating and hiding -- (1) cheating in the absence of a test and (2) hiding bad-looking data. It was reasoned in [32], [37], [36] that authors must report at least the average error of all trained networks, good and bad, on the validation set (called general cross-validation in this paper). Better, report also five percentage positions of ranked errors. From the new analysis here, we can see that the hidden culprit is Post-Selection. This is also true for Post-Selection on hand-tuned or searched hyperparameters, because they are random, depending on random observation data. Does cross-validation on data splits rescue Post-Selections from the Misconducts (1) and (2)? The new result here says: No. Specifically, this paper reveals that using cross-validation for data splits is insufficient to exonerate Post-Selections in machine learning. In general, Post-Selections of statistical learners based on their errors on the validation set are statistically invalid.

misconduct, post-selection, validation, (12 more...)

arXiv.org Artificial Intelligence

2403.00773

Country:

Europe > Russia (0.14)
Asia > Russia (0.14)
North America > United States > New York (0.04)
(13 more...)

Genre: Research Report (0.84)

Industry:

Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling

Vowels, Matthew J.

arXiv.org Machine LearningJan-2-2024

Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2308.04365

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

Add feedback

Modelling wildland fire burn severity in California using a spatial Super Learner approach

Simafranca, Nicholas, Willoughby, Bryant, O'Neil, Erin, Farr, Sophie, Reich, Brian J, Giertych, Naomi, Johnson, Margaret, Pascolini-Campbell, Madeleine

arXiv.org Artificial IntelligenceNov-25-2023

Given the increasing prevalence of wildland fires in the Western US, there is a critical need to develop tools to understand and accurately predict burn severity. We develop a machine learning model to predict post-fire burn severity using pre-fire remotely sensed data. Hydrological, ecological, and topographical variables collected from four regions of California - the sites of the Kincade fire (2019), the CZU Lightning Complex fire (2020), the Windy fire (2021), and the KNP Fire (2021) - are used as predictors of the difference normalized burn ratio. We hypothesize that a Super Learner (SL) algorithm that accounts for spatial autocorrelation using Vecchia's Gaussian approximation will accurately model burn severity. In all combinations of test and training sets explored, the results of our model showed the SL algorithm outperformed standard Linear Regression methods. After fitting and verifying the performance of the SL model, we use interpretable machine learning tools to determine the main drivers of severe burn damage, including greenness, elevation and fire weather variables. These findings provide actionable insights that enable communities to strategize interventions, such as early fire detection systems, pre-fire season vegetation clearing activities, and resource allocation during emergency responses. When implemented, this model has the potential to minimize the loss of human life, property, resources, and ecosystems in California.

base learner, burn severity, prediction, (13 more...)

arXiv.org Artificial Intelligence

2311.16187

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > North Carolina (0.04)
North America > United States > Nevada (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Law Enforcement & Public Safety (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Practical considerations for variable screening in the Super Learner

Williamson, Brian D., King, Drew, Huang, Ying

arXiv.org Machine LearningNov-6-2023

Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.

artificial intelligence, machine learning, super learner, (16 more...)

arXiv.org Machine Learning

2311.03313

Country: North America > United States > Oklahoma > Payne County > Cushing (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes

Chatton, Arthur, Bally, Michèle, Lévesque, Renée, Malenica, Ivana, Platt, Robert W., Schnitzer, Mireille E.

arXiv.org Machine LearningOct-12-2023

Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.

artificial intelligence, learner, machine learning, (19 more...)

arXiv.org Machine Learning

2310.08479

Country:

North America > Canada > Quebec > Montreal (0.15)
North America > United States > New York > New York County > New York City (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Epidemiology (0.95)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

A Huber loss-based super learner with applications to healthcare expenditures

Wu, Ziyue, Benkeser, David

arXiv.org Machine LearningMay-13-2022

Complex distributions of the healthcare expenditure pose challenges to statistical modeling via a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may have poor performance in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings where optimizing mean squared error is the ultimate goal. For this latter scenario, we provide two methods for performing a grid search for values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method.

artificial intelligence, learner, machine learning, (16 more...)

arXiv.org Machine Learning

2205.0687

Country: North America > United States > Maryland > Montgomery County > Rockville (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Personalized Online Machine Learning

Malenica, Ivana, Phillips, Rachael V., Pirracchio, Romain, Chambaz, Antoine, Hubbard, Alan, van der Laan, Mark J.

arXiv.org Machine LearningSep-21-2021

In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID) to many individuals (i.e., optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real-time. POSL can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed algorithms that are never updated during the procedure, pooled algorithms that learn from many individuals' time-series, and individualized algorithms that learn from within a single time-series. POSL's ensembling of this hybrid of base learning strategies depends on the amount of data collected, the stationarity of the time-series, and the mutual characteristics of a group of time-series. In essence, POSL decides whether to learn across samples, through time, or both, based on the underlying (unknown) structure in the data. For a wide range of simulations that reflect realistic forecasting scenarios, and in a medical data application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for time-series data and adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time-series enter/exit dynamically over chronological time.

algorithm, estimator, learner, (13 more...)

arXiv.org Machine Learning

2109.10452

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
(4 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback